The information presented here is placed in the public domain, and was written by Doug Burke. The notebook used to create this page is available (empty or filled), and questions can be asked using the GitHub issues page or via Twitter: https://twitter.com/doug_burke.
It will probably make a lot more sense if you have already read the previous entries in this "series".
Today I'm going to read in some Astronomy data and plot it. This will require manually parsing a FITS (Flexible Image Transport System) binary table file$^\dagger$; it was going to be more automated, but the post was getting too long so I decided to leave that part for another day and just stick with the manual code below (one of my current pet ideas is to provide a simple FITS back end to the Frames library for reading in data from FITS files, so maybe I'll write something up soon if I get time to work on it).
I have updated to the development version of IHaskell
, rather than the version on
Hackage, to see if it fixed some issues I was seeing
(which it did). One of the changes this has made is to jump to IPython 3.0 - a.k.a. the
Jupyter project - which will hopefully not be a problem for
people playing along at home.
$^\dagger$ We normally use the acronym - i.e. FITS - and forget about the fact that we are using what is ostensibly an image format to read in tabular data.
In [1]:
import Data.Time
getCurrentTime
The file I want to display contains an "effective area" curve for a source observed by the Chandra X-ray Observatory. This curve is known, to X-ray astronomers, as an ARF (Auxiliary Response File)$^\dagger$. I can use one of the tools that we provide as part of CIAO$^\ast$ to display some important information about the file.
$^\dagger$ Wow, a TLA that isn't on Wikipedia!
$^\ast$ We in the Science Data Systems group in the CXC are particularly proud of this contrived acronym, even if we can't distinguish between a language and a dialect.
The dmlist
tool is used to display the structure and contents of the file formats
we use when analyzing X-ray data:
% dmlist src.arf blocks
--------------------------------------------------------------------------------
Dataset: src.arf
--------------------------------------------------------------------------------
Block Name Type Dimensions
--------------------------------------------------------------------------------
Block 1: PRIMARY Null
Block 2: SPECRESP Table 3 cols x 1070 rows
So, there are two "blocks" of data in the file, with the first one being empty. This is actually a consequence of the I (for Image) in FITS, where the first block$^\ddagger$ can either be empty or contain image data. A result of this is that tabular data is always relegated to the second (or later) block.
We can also see what columns are stored in the table:
% dmlist src.arf cols
--------------------------------------------------------------------------------
Columns for Table Block SPECRESP
--------------------------------------------------------------------------------
ColNo Name Unit Type Range
1 ENERG_LO keV Real4 -Inf:+Inf Min Energy
2 ENERG_HI keV Real4 -Inf:+Inf Max Energy
3 SPECRESP cm**2 Real4 -Inf:+Inf Effective Area
and even see some of the data (which will be useful later, when I want to check that I've actually read things in properly):
% dmlist "src.arf[#row=1:5]" data
--------------------------------------------------------------------------------
Data for Table Block SPECRESP
--------------------------------------------------------------------------------
ROW ENERG_LO ENERG_HI SPECRESP
1 0.30000001192093 0.31000000238419 4.8743429184
2 0.31000000238419 0.31999999284744 14.8292617798
3 0.31999999284744 0.33000001311302 21.3022918701
4 0.33000001311302 0.34000000357628 28.5149517059
5 0.34000000357628 0.34999999403954 35.3988304138
$^\ddagger$ I really should use the term HDU
- which means Header/Data Unit - rather than block,
but in CIAO we tend to use the term block, and as the
Chandra X-ray Center pay my bills I'm sticking with this terminology.
So, how can I read in the data? Well, I happen to know that the data is stored in blocks of text (US ASCII) for the metadata, interspersed with binary blocks for the actual data. More detailed information is available in the FITS primer, and the FITS standard will also be required reading (if you are having trouble sleeping). I'll just add a note to say that in Astronomy we use the FITS standard but then layer on more domain-specific standards (occasionally these are "best practices" rather than any real form of a standard) to enhance the data representation (in particular for the metadata). This isn't really important for what I'm about to do, but I thought I'd mention it. There are also addenda to the standard, since it has changed over time, but they aren't important for today's work.
I'll start by loading in the file as a bytestring,
picking the Char8
variant since the text is in US ASCII format. It would be interesting to do
this via a streaming interface, but for this example that would be overkill.
In [2]:
import qualified Data.ByteString.Char8 as B8
There are several representations for text-like data in Haskell: the language standard defines
the String
type, then there are ByteString
and Text
values which are more efficient (and have
subtly-different use cases). This means that we end up with a bunch of routines
that do the same thing,
and often have the same name, but work on different data types$^\dagger$.
For example, there's a readFile
function in the standard Haskell prelude$^\ddagger$, but
this returns a String
, so I need to use the ByteString version, which I indicate
using the prefix I just defined: B8.readFile
.
$^\dagger$ There are enough semantic differences in the operations that a universal "string-like" typeclass hasn't become popular in the community.
$^\ddagger$ the prelude is the name for the "default set of symbols" you get to use in Haskell (as it's a module
name I should refer to it as Prelude
).
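As a quick comparison (the exact form of the output will depend on the notebook version), asking for the two types shows the difference:
:type readFile
-- readFile :: FilePath -> IO String
:type B8.readFile
-- B8.readFile :: FilePath -> IO ByteString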
In [3]:
cts <- B8.readFile "../data/src.arf"
:type cts
Let's see how large the file is
(since this is the
Char8
version of a bytestring, we don't have to worry about non-eight-bit characters):
In [4]:
B8.length cts
Now, FITS files are written in chunks of 2880
bytes - that is, the file is padded with
trailing space characters (or null bytes) if needed - so how many chunks are there?
In [5]:
len = B8.length cts
len / 2880
Argh!!! This type-safe language has stopped me from making a type error and it's annoying! The length function returns an Int
but the division operator /
requires a
Fractional
constraint, and the
error is telling me that integers (or, at least values with a type of Int
), aren't fractional. What I need
is to use functions from the
Integral
typeclass: in this case
I'm going to use
divMod
to return both the number
of blocks and any remainder.
In [6]:
len `divMod` 2880
So, the file contains 12 chunks. As a reminder, or perhaps I haven't mentioned it before, binary functions such as divMod
can be written in
"infix" form by "quoting" there name; that is, the above is equal to
divMod len 2880
I tend to use the infix form if it makes it clearer (to me) what the arguments mean. There are also those
binary functions - such as +
- which are defined as being "infix" so do not need the backticks;
however, when used in prefix form they must be surrounded by ()
:
In [7]:
1 + 2
(+) 1 2
Since the value 2880
is going to be used a few times below, let's "name" it:
In [8]:
chunk :: Int
chunk = 2880
Now, FITS files start with a Header Unit, that is 2880*n
characters of ASCII text (where n
is an integer),
and these header units are broken up every 80 characters (known as a "card"). So, let's look at the
contents of the first card:
In [9]:
B8.take 80 cts
Each card can be considered to be a keyword/value pair with an optional comment (or description). The values are typed, in the sense that there are strings, booleans, integers, or floats, but there's no syntactic constraint that a given key has a specific type. The layout is pretty simple:

- the name of the keyword is in the first 8 characters (which can only be A-Z, 0-9, - or _; I was under the impression that the name had to start with A-Z but this may not be the case, or it's clarified in some later document)
- characters 9 and 10 are "= " (actually, there are a few informational keywords that don't have this, but I'm just going to ignore those today, since they are metadata about the metadata)
- a string representation of the value: a character string, boolean, integer, floating point, or complex number
- an optional description (or comment), which is started with the / character.
A more detailed description of these elements can be found in the FITS standard.
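As a concrete - if made-up - illustration, here is what a single card might look like (the keyword TELESCOP and its value are just representative; the important thing is the fixed layout and the padding out to 80 characters):
-- columns 1-8  : the keyword name
-- columns 9-10 : the "= " marker
-- the rest     : the value, an optional "/ comment", and trailing padding
exampleCard :: B8.ByteString
exampleCard = B8.pack ("TELESCOP= 'CHANDRA '  / telescope used" ++ replicate 42 ' ')
B8.length exampleCard   -- 80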
At this point you should be wondering why Astronomers use a format that requires
them to write their own parser. Well, we do have standard parsers - a recent entrant is the
astropy
Python package and the most-used compiled version
is the
CFITSIO library - but I wanted to
explore some simple parsing in Haskell.
Let's start by splitting up the input into cards, that is, 80-character chunks, using splitAt:
In [10]:
-- As a reminder, type introduces a synonym which can be used to help document a
-- function (in this case, which part of the return is the card and which the
-- remaining input), but it provides no "extra" type safety, since it is
-- perfectly fine to treat a Card as a ByteString (since that is what it is),
-- or vice versa.
--
type Card = B8.ByteString
-- | Given a string, split off the next 80 characters and return the remaining values.
getCard :: B8.ByteString -> (Card, B8.ByteString)
getCard = B8.splitAt 80
Here I've taken advantage of
partial application to define getCard
. I could
have explicitly included the input variable, as in the following definition
getCard bs = B8.splitAt 80 bs
but it's common to see the partially-applied version (one argument for this form is that you focus more on the code because you aren't distracted by variables that aren't actually "doing anything"). This approach is so common that the "standard" Haskell linting tool will complain if it thinks you've not been eta-reducing enough. The notebook runs this linter by default, so we can see an example of this suggestion:
In [11]:
test bs = B8.splitAt 80 bs
where the Eta reduce term comes from the lambda calculus,
and is mentioned in this
StackOverflow question. Note that if I had given test
a signature, as I did with getCard
, then the suggestion would not
have been created (and I'm too lazy to find out why).
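For anyone who hasn't met the term before, here is eta reduction in miniature (the names are made up): since the argument appears only at the very end of the definition, it can be dropped from both sides, and hlint will suggest the second form.
addTen1 :: Int -> Int
addTen1 x = (+ 10) x
-- the eta-reduced version
addTen2 :: Int -> Int
addTen2 = (+ 10)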
As a quick check of getCard
, here's the first card, where I use
pattern matching
to deconstruct the return value
(i.e. separate out the pair), and the '_' syntax tells the compiler that I am not interested in the
second element of this tuple.
In [12]:
(a,_) = getCard cts
a
In [13]:
:type fst
:type snd
Using the fst
function, I can rewrite the above to avoid having to even talk about the second
value returned by getCard
:
In [14]:
fst (getCard cts)
The getCard
function isn't that useful on its own. What I really want is something that splits
every 80 characters, returning an array of values$^\dagger$. For this example I want to use a
sequence
to represent the array (rather than the standard Haskell list), since it should be more efficient
for some of the data access patterns we would want when accessing the file metadata$^\ddagger$.
As with string-like operations, containers (such as lists, sequences, and vectors)
have many common names, so I use a qualified import of the Data.Sequence
module
to avoid name clashes. I then explicitly import several symbols that do not clash,
to avoid having to prefix them with "Seq.
", as Seq.|>
and Seq.><
aren't the easiest symbols to read.
$^\dagger$ There is the Haskell split package which provides many routines for splitting up lists, but it's also easy to write your own, as I do here.
$^\ddagger$ This is why I also import the symbols ViewR
, |>
, ><
, and viewr
, which involve
creating and breaking-up sequences.
In [15]:
import qualified Data.Sequence as Seq
import Data.Sequence (ViewR(..), (|>), (><), viewr)
I want a function that will take a bytestring, split off the first 80 characters, and then call itself
on the remaining string (in Haskell it's common to use recursion when you'd use the equivalent of a
Python for
loop). Here's an initial version which uses getCard
to break down the input, stopping when it is empty. The syntax
acc |> card
adds$^\dagger$ card
onto the end of the sequence acc
, so the go
step adds on the
current card to the end of the accumulator (the first argument) until the input string is
empty, at which time the sequence is returned.
$^\dagger$ Actually, it creates a new sequence - formed by combining the inputs - rather than modifying
the contents of the card
sequence. That is, it is not Python's acc.append(card)
but
more like newacc = acc[:]; newacc.append(card)
, but without the need to actually copy data, due to
the way data is "shared" in Haskell.
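A small (made-up) example of this behavior: appending to a sequence produces a new value and leaves the original untouched.
s1 = Seq.fromList ["a", "b"]
s2 = s1 |> "c"
s1   -- fromList ["a","b"]      (unchanged)
s2   -- fromList ["a","b","c"]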
In [16]:
getCards1 :: B8.ByteString -> Seq.Seq Card
getCards1 bs = go Seq.empty bs
where
go acc xs | B8.null xs = acc
| otherwise = let (card,todo) = getCard xs
newacc = acc |> card
in go newacc todo
The function above introduces guards, that is the syntax
name args | conditional1 = value1
          | conditional2 = value2
          ...
          | otherwise    = valueN
The terms to the right of the |
character (i.e. conditional1 ...) are the guards, and act like a giant if statement, in that they are checked from top to bottom, with the first condition that evaluates to True
"winning" (the otherwise
symbol is just another name for True
, so
it acts like the default
label in C-style case statements).
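Here is a stand-alone example of guards (nothing to do with FITS), in case the pattern is new to you:
classify :: Int -> String
classify n | n < 0     = "negative"
           | n == 0    = "zero"
           | otherwise = "positive"
classify 12   -- "positive"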
This is an example of a routine for which "eta reduction" makes sense, in that the bs
argument sent in to getCards1
is only used in the initial call to
go
, but I left it in for clarity (and because I'm about to show a different version
which uses the eta-reduced form!).
In this new version, I take advantage of the
unfold pattern
(postscript version) - which
is available for most Haskell container data types - to
deal with the building of the sequence.
Here I use
Seq.unfoldr
,
which is given a routine that
deconstructs the bytestring (i.e. it returns the next element and the remainder of the bytestring).
This sounds a lot like getCard
, although it's a slightly-modified version, as shown below, due
to how unfoldr
is defined:
In [17]:
getCards :: B8.ByteString -> Seq.Seq Card
getCards = Seq.unfoldr go
where
go bs | B8.null bs = Nothing
| otherwise = Just (getCard bs)
To explain this a bit further, let's look at unfoldr
:
In [18]:
:type Seq.unfoldr
The first argument is a function which takes a value (of type b
), and returns a value
with a
Maybe
type.
This is used in Haskell to represent optional values, since there are two cases:
- Nothing, to represent the empty (or missing, or "null", or ...) case
- Just v, which indicates that there is a result, with a value v
In this case, the function uses Nothing
to indicate that the recursion is to stop, otherwise
it returns a pair of values as Just (a,b)
, where a
is the value to store and b
is the
new "seed" value (i.e. input to the next call of the function).
Let's try with a simple example: a routine that is given an integer and returns the value converted to a string, along with the initial value reduced by 1. To indicate it has finished, the routine will stop (i.e. return Nothing
) if the integer value is less than 1.
In [19]:
silly :: Int -> Maybe (String, Int)
silly a = if a < 1 then Nothing else Just (show a, a-1)
Here's several examples of its behavior:
In [20]:
silly 5
silly 1
silly 0
When used with unfoldr
, we get a routine that takes an integer and returns a sequence of strings:
In [21]:
:type Seq.unfoldr silly
Hopefully, if I've been able to explain this, the following output should come as no surprise (well, apart from
the
fromList
part,
which is part of the
Show
representation of a Sequence):
In [22]:
Seq.unfoldr silly 5
With that digression out of the way, let's get back to some parsing. As the data comes in
chunks of 2880 bytes, let's just try with the first chunk
(take
returns the first n
bytes of the string):
In [23]:
cards = getCards (B8.take chunk cts)
:type cards
In [24]:
Seq.length cards
I can use array indexing to access the contents:
In [25]:
Seq.index cards 0
In [26]:
Seq.index cards 1
In [27]:
Seq.index cards 2
As NAXIS=0
, there's no data associated with this block, just metadata. The question is, just how much
metadata? Now, due to the way that the header units are defined, there are three cases for the last card in a chunk of 2880 bytes: it contains the END keyword (so this is the last chunk of the header); it is blank, because the END card appeared earlier in the chunk and the rest is just padding; or it contains a normal keyword, in which case the header continues into the next chunk.
This means that all I need to do is check the first four characters of the last card: if it's
"END "
or " "
then it's the end of the header data. I can use the
viewr
function to extract the last card (this is one of the advantages of using a sequence
over a Haskell list, namely efficient access to both ends of the data). The return value has the type
ViewR
which can either be empty (EmptyR
), or lcards :> lelem, where lelem is the last element and lcards is the sequence of all the cards preceding lelem.
In [28]:
lcards :> lelem = viewr cards
With this, the last card of this block is:
In [29]:
lelem
The above call to viewr
- or, more particularly, the deconstruction
of the return value using :>
- is incomplete, because it does not deal
with all-possible values. In this particular case we know that cards
is not empty, so the answer can not be EmptyR
. When given an arbitrary
sequence, the empty case must also be accounted for, and I'll show
code later on that shows how this can be done.
This piece of code is a good example showing how Haskell's laziness can result in surprising behavior. It's not obvious that a function has actually been evaluated, even after a statement like
lcards :> lelem = viewr cards
It's a
bit like poor-old Schrödinger's cat: we won't know the state until the compiler evaluates the
answer ("opens the box"). In the code above, it was only when lelem
was
displayed that we could be sure that the viewr
call had "run".
As an example of this, note what happens when viewr
is called with an empty sequence:
In [30]:
xxx :> yyy = viewr Seq.empty
There's no error at this point. It is only when we do something with one of the values that the error makes itself known:
In [31]:
Seq.length xxx
The Irrefutable pattern
failure means that
it could not match the expected value of xxx :> yyy
with the actual value, which in this case is EmptyR
.
Getting back to the header parsing, the fact that lelem
is all blank indicates that this is the
end of the first block. Let's write a quick function to encode this logic, on the off chance that I may
want to use it again...
In [32]:
hasEndCard :: Seq.Seq Card -> Bool
hasEndCard cards = case viewr cards of
_ :> card -> any (`B8.isPrefixOf` card) [B8.pack " ", B8.pack "END "]
EmptyR -> True -- treat an empty sequence as indicating the end of a block
I use the case
syntax so that I can safely handle an empty input, rather than
causing a run-time error (since they're somewhat frowned upon in Haskell!). If the sequence
is not empty then I use pattern matching to "deconstruct" the return value; that is
I use the syntax _ :> card
to provide a name for the last card and to tell
the compiler I don't care about the preceding elements in the sequence
(i.e. the "_
" part).
The check on the last card is performed using the any
routine, which has the signature:
In [33]:
:type any
As I want to check whether card
starts with " "
or "END "
, then the function sent
to any
takes a prefix and checks if matches the start of card
, and the array of values
are the prefixes to use. Note that there are several ways this could have been written, for instance
being explicit with the functions (rather than using partial application) and moving the
pack
call into the anonymous function:
any (\prefix -> (B8.pack prefix) `B8.isPrefixOf` card) [" ", "END "]
The last check handles calls when the input sequence is empty,
in which I somewhat-arbitrarily decided to return True
.
Here I was explicit in the case statement; that is, I wrote
EmptyR -> True
but you will often see the "default" case written as
_ -> True
A quick check that it works as expected:
In [34]:
hasEndCard cards
So, we can skip to the next chunk, which will be a header unit for the second block,
but this time there's more than 2880 bytes of metadata
(the drop
call ignores the first chunk
bytes of cts
, so the take
/drop
calls are equivalent to Python's [chunk:2*chunk] array slice syntax):
In [35]:
cards2 = getCards (B8.take chunk (B8.drop chunk cts))
hasEndCard cards2
What I want is a routine that will strip off 2880-byte chunks, convert them to
cards, continuing until an END
card is found. This sounds like an extension
to how getCards
works, in that it requires repeated application of
the routine. It would also be nice if it
returned the unused data, since we will want that when extracting the data values.
In [36]:
getUnits :: B8.ByteString -> (Seq.Seq Card, B8.ByteString)
getUnits = go Seq.empty
where
-- splitAt will work if the input text is smaller than the chunk
-- size, but I include the check to make the code a bit clearer
-- in intent (also, if the file is a valid FITS file then this
-- first check should never be fired).
--
go cards bs | B8.length bs < chunk = (cards, bs)
| otherwise = let (ls,rs) = B8.splitAt chunk bs
newCards = getCards ls
-- the >< operator appends the two sequences together
combined = cards >< newCards
in if hasEndCard newCards
then (combined, rs)
else go combined rs
(units, bdata) = getUnits (B8.drop chunk cts)
In [37]:
Seq.length units
Seq.index units 0
Seq.index units 1
Looking at the data section, we can see it's binary encoded, which is no surprise since
the XTENSION
keyword was set to BINTABLE
. I need access to the header information to
be able to parse this data.
In [38]:
B8.take 20 bdata
So, how do we handle the metadata section? Well, the FITS header can be viewed as a glorified key-value map,
so let's use this as an API (this means that I lose ordering information, and informational keywords
such as HISTORY
and COMMENT
, but they aren't needed to find out how to decode the data section).
First I load up some useful modules, which include the
Map
API,
the
Foldable
module,
and the
isSpace
and
catMaybes
functions.
In [39]:
import qualified Data.Map as Map
import qualified Data.Foldable as Fold
import Data.Char (isSpace)
import Data.Maybe (catMaybes)
Next up is a helper function - rstrip
- and one to
strip out the (key,value) pairs from a card (splitCard
).
The rstrip
function takes advantage of the
unsnoc
function, which returns a bytestring containing all-but-the-last character,
and the last character. The routine recursively calls
itself until a non-whitespace character is found. I believe that the name
unsnoc
comes
from a "play" on words, in that cons
is used to describe adding an element to the
start of a list (Lisp heritage),
so that the act of adding an element to the end of a list
is the reverse of cons
, namely snoc
.
The un
part then comes from the fact that here we are deconstructing, rather than
creating, the string.
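A couple of (made-up) examples of unsnoc, which may make the rstrip code below easier to follow:
B8.unsnoc (B8.pack "card  ")   -- Just ("card ", ' ')
B8.unsnoc B8.empty             -- Nothing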
The splitCard
function decides whether a card contains a key/value pair,
returning Nothing
if it doesn't or splitting up the card into
its parts if it does (wrapping the result in Just
). To
make downstream processing easier, the key and value strings
are "cleaned up" (trailing whitespace is removed, so it's fortuitous
that I've just written a routine to do this) as well
as converted from a ByteString
to a String
.
Now, at first glance, splitCard
is inefficient;
for example, it looks like key
and val
are always going
to be evaluated, even when Nothing
is returned. However,
there are three points to consider:
we shouldn't really be worrying about performance until it becomes obvious that it's a problem;
thanks to Haskell's laziness, code is evaluated "on demand", so routines can have very-different performance characteristics compared to the same code written in a language like Python;
the Haskell compiler can perform some rather impressive optimisations (although this is less relevant when running in an interactive environment like this IHaskell notebook).
In [40]:
-- | Remove trailing whitespace from a bytestring.
--
rstrip :: B8.ByteString -> B8.ByteString
rstrip bs = case B8.unsnoc bs of
Just (nbs, c) -> if isSpace c then rstrip nbs else bs
_ -> bs
-- | If a card has a keyword and value (i.e. is of the form
-- 'keyword = value') return the two parts, after stripping
-- out trailing whitespace.
--
splitCard ::
Card
-> Maybe (String, String) -- ^ (keyword, value)
splitCard card =
let (l,r) = B8.splitAt 8 card
(cs,v) = B8.splitAt 2 r
key = rstrip l
val = rstrip (B8.dropWhile isSpace v)
sep = B8.pack "= "
in if cs == sep
then Just (B8.unpack key, B8.unpack val)
else Nothing
We can see what splitCard
does by giving it a card:
In [41]:
Seq.index units 1
splitCard (Seq.index units 1)
The reason for converting to a (key,value) pair is that this is the format
used to populate maps. The map1
map (or dictionary) - which we create
below - contains one keyword,
"testkey"
, whose value is "value"
. The
Map.lookup
function
is used to query the map for a keyword: if it doesn't exist then
Nothing
is returned, otherwise the value is returned wrapped inside a
Just
.
In [42]:
map1 = Map.fromList [("testkey", "value")]
Map.lookup "42" map1
Map.lookup "testkey" map1
The initial version of the conversion routine - i.e. to create the key/value map
from a sequence of header cards - is given below, as cardsToMap2
(the awkward numeric suffix is because I'm "counting down" to the final version
of the routine):
In [43]:
cardsToMap2 :: Seq.Seq Card -> Map.Map String String
cardsToMap2 cs =
let -- step 1
xs :: [Card]
xs = Fold.toList cs
-- step 2
ys :: [Maybe (String,String)]
ys = map splitCard xs
-- step 3
zs :: [(String,String)]
zs = catMaybes ys
in Map.fromList zs -- step 4
The steps have been explicitly broken out and I have included the types of the terms for documentation. The four steps are:
1. convert from a sequence into a list using Fold.toList (the Foldable module provides a number of generic routines useful for sequences - i.e. it somewhat contradicts my earlier comment that there's no common abstraction - but it is quite an "abstract" abstraction, so I have stayed away from using it in this notebook where possible);
2. apply splitCard to each card;
3. filter out all the Nothing entries created by splitCard and extract the values from the Just elements using the catMaybes function (which has a signature [Maybe a] -> [a]);
4. convert this list into a map using the fromList function.
This function can also be written in one line:
In [44]:
cardsToMap1 :: Seq.Seq Card -> Map.Map String String
cardsToMap1 cs = Map.fromList (catMaybes (map splitCard (Fold.toList cs)))
which I only introduced as it will hopefully make it more obvious what the final version of the
routine is doing (shown below). However, it also highlights a code simplification, since
the pattern catMaybes (map f xs)
can be replaced with mapMaybe f xs
.
The
mapMaybe
function has the signature
In [45]:
import Data.Maybe (mapMaybe)
:type mapMaybe
and will apply the function (here, splitCard
) to every element of the list,
throwing out all the Nothing
values and extracting the successful values
by removing the Just
constructor. This modification was proffered by
the Haskell linter (hlint
),
which also suggested the eta-reduction step earlier.
The final version of the conversion routine is:
In [46]:
cardsToMap :: Seq.Seq Card -> Map.Map String String
cardsToMap = Map.fromList . mapMaybe splitCard . Fold.toList
In this version I have used
point-free syntax
(which - for the purpose of this notebook - means without including parameter names),
partial application,
and
function composition
(the ".
" operator)
to combine the steps. This is all a big digression, since the three versions of the
routine all do the same thing, but I mention it here as this style is
common in Haskell code.
The .
operator is used in two ways in the function cardsToMap
:
- to indicate the namespace of a symbol (e.g. that fromList is taken from the Map module);
- to combine functions.
In the second case, .
is itself a function, with the following signature
In [47]:
:type (.)
That is, if f
and g
are single-argument functions, then f . g
and g . f
are both single argument functions,
where (f . g) x means the same as f (g x), and (g . f) x means g (f x). So, function composition is read right to left. I find that this syntax can help focus on the way information flows through a sequence of functions,
but it can definitely be hard to read, and occasionally
taken way past sensible extremes!
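As a tiny stand-alone example of composition (the name nDigits is made up): reading right to left, the number is first converted to a String and then its characters are counted.
nDigits :: Int -> Int
nDigits = length . show
nDigits 1070   -- 4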
It's time to stop the digressions and actually use the function:
In [48]:
hdr = cardsToMap units
The
size
function returns the number of keys in the map:
In [49]:
Map.size hdr
and I can use lookup
on it:
In [50]:
Map.lookup "XTENSION" hdr
Map.lookup "DOESNOTEXIST" hdr
Since I'm going to want to call lookup
a few times - and I'm lazy - I want a helper routine
that means I don't have to send in the header, and automatically removes the Just
wrapper:
In [51]:
import Data.Maybe (fromJust)
getKey1 key = fromJust (Map.lookup key hdr)
Following my point-free mania, this can be "simplified" to:
In [52]:
getKey = fromJust . flip Map.lookup hdr
This uses the
flip
function to swap the argument order to lookup
, so that I can use eta reduction to avoid
naming the keyword variable, and
fromJust
to extract the value from the Just
constructor. Now, fromJust
is
what is known as a
partial function since it is not
defined for all inputs: that is, if the input is a Nothing
the routine
will raise a run-time error.
This is not normally considered "good form" in Haskell code, but I feel it's okay for
this interactive exploration.
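If I did want to avoid the partial function, one option - a sketch, not used in the rest of this notebook - is to supply a default value with fromMaybe; the name getKeyDef is made up.
import Data.Maybe (fromMaybe)
-- return a marker string, rather than raising an error, for a missing keyword
getKeyDef :: String -> String
getKeyDef key = fromMaybe "<missing>" (Map.lookup key hdr)
getKeyDef "DOESNOTEXIST"   -- "<missing>"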
Here's a run through of how the types work out when flip
is used. As with
many of the "tips" I've been showing here, it should be used with care!
In [53]:
:type Map.lookup
:type flip
:type flip Map.lookup
:type flip Map.lookup hdr
:type fromJust . flip Map.lookup hdr
In case you are interested, here is an example of getKey
"blowing up" due to a
missing keyword:
In [54]:
getKey "DOESNOTEXIST"
For a FITS binary table, the important "structural" keywords - i.e. those that define the layout and size of the binary data - are:
In [55]:
getKey "BITPIX"
getKey "NAXIS"
getKey "NAXIS1"
getKey "NAXIS2"
getKey "TFIELDS"
So, there are three fields - i.e. columns - in each row, each row is 12 bytes long, and there are 1070 rows of data. The field (column) information is stored in the following keywords:
In [56]:
getKey "TTYPE1"
getKey "TFORM1"
getKey "TUNIT1"
In [57]:
getKey "TTYPE2"
getKey "TFORM2"
getKey "TUNIT2"
In [58]:
getKey "TTYPE3"
getKey "TFORM3"
getKey "TUNIT3"
So, the three columns are called
ENERG_LO
, ENERG_HI
, and SPECRESP
(the TTYPEn
values);
the data is represented as three 32-bit IEEE-754 floating-point numbers with no packing (the
NAXIS1
and TFORMn
values) and is in big-endian format$^\dagger$. The TUNITn
keys give any units
associated with the column, and are not needed to parse the data. It would be an interesting
experiment to see if the unit value could be used to create a "numerically-typed" variable,
as I used in my
first three notebooks,
and I may come back to this idea at some unspecified point in the future!
$^\dagger$ I'm sure the standard specifies this, but I just tried the various "endian" conversion routines until I got the answers I expected!
For this notebook I am going to use an "array of structures" representation - that is, create a record to represent each row - rather than a "structure of arrays". For this, I need a row type, such as
data Row = Row { energLo :: Float, energHi :: Float, specResp :: Float }
which uses the Haskell
record syntax
to allow the fields of the structure to be named (it's a bit like a C-style struct
or Python's namedtuple
).
However, I'm going to be a little-more general than this, and allow the row to be parameterised by the data type used to store the data; that is:
In [59]:
data Row a = Row { energLo :: a, energHi :: a, specResp :: a } deriving (Eq, Show)
I've told the compiler to
automatically create
instances of the
Eq
and
Show
typeclasses for this type: the latter because it'll be useful when checking that the
parsing has worked and the former for a sanity check I make later on, once I've read in all the data.
In [60]:
instance Functor Row where
fmap f (Row elo ehi spec) = Row (f elo) (f ehi) (f spec)
I manually create
a
Functor
instance
because I'll want that when plotting up the results$^\dagger$.
I could have got the compiler to derive this
code too, by turning on the DeriveFunctor
extension, but I felt it useful to show the code.
The functor instance lets you apply a function to all elements of the row (so just like how
map
lets you apply a function to all elements of a list and it is how I applied
splitCard
to every element in the sequence as part of the cardsToMap
function).
$^\dagger$ This isn't strictly necessary, as it doesn't save much code in this particular example, but why not!
As an example of this functor instance, let's create a row of float values:
In [61]:
testRow :: Row Float
testRow = Row 0.1 0.2 10.0
which can be displayed (taking advantage of the automatically-derived Show
instance):
In [62]:
testRow
The functor instance allows us to convert all the values to a string,
by applying show
to each field using
fmap
:
In [63]:
fmap show testRow
:type fmap show testRow
or even keep the values with a Float
type but changing their
value - e.g. multiplying by two:
In [64]:
fmap (*2) testRow
Now it's time to start decoding the binary data; the parser combinators I use below come from the Data.Serialize modules (provided by the cereal package).
In [65]:
import Data.Serialize.Get
import Data.Serialize.IEEE754
Let's start simply, by just parsing the first value (so the ENERG_LO
column of the first row)
and checking that it has a value of 0.3
(which is what dmlist
reported its value to be). The simplest
way to do this is to use
runGet
,
which takes a parser and a bytestring, and returns the answer.
For 32-bit floats in big-endian order, the parser is called
getFloat32be
:
In [66]:
runGet getFloat32be bdata
Success! By manually moving through the binary data, I can extract the following elements, such as the ENERG_HI and SPECRESP values from the first row,
and the ENERG_LO
value from the second row:
In [67]:
runGet getFloat32be (B8.drop 4 bdata)
runGet getFloat32be (B8.drop 8 bdata)
runGet getFloat32be (B8.drop 12 bdata)
The values returned by runGet
are preceeded by Right
, which is because
the return value is actually Either String a
, where a
is the type of the
parser (Float
in this case).
The
Either
type,
is - like the Maybe
type - used to handle cases where two different values are needed.
In the Maybe
case this is Nothing
(e.g. "failure") or Just a
; in the Either
case,
the two values are Left a
and Right b
, where the types of a
and b
can be different.
One of the common use cases for Either
values is to handle calculations that can fail,
where the "left side" has a value of Left String
, for the error message, and the "right
side" is used to indicate success, which is what we have here (Right a
). So, we can ignore
for now the Right
prefixes, and concentrate on the numeric values. As a reminder, the
dmlist
output for the three rows was:
ROW ENERG_LO ENERG_HI SPECRESP
1 0.30000001192093 0.31000000238419 4.8743429184
2 0.31000000238419 0.31999999284744 14.8292617798
3 0.31999999284744 0.33000001311302 21.3022918701
which shows that I'm getting the right values (modulo differences in the display of floating-point values).
So, I could use this to parse each element at a time, manually stepping
through the bytestring, but that
seems like a lot of work. The next thing to try is the
runGetState
routine, which also returns the unparsed data. That is:
In [68]:
-- The last argument is the index into the input string at which to start parsing,
-- which is 0 here.
--
Right (v1, bdata1) = runGetState getFloat32be bdata 0
Right (v2, bdata2) = runGetState getFloat32be bdata1 0
Right (v3, bdata3) = runGetState getFloat32be bdata2 0
Right (v4, _) = runGetState getFloat32be bdata3 0
(v1, v2, v3, v4)
Here I've assumed that the parsing can not fail, so it is safe to assume the return value is
of the form Right ...
. If it did fail then we would
see the Irrefutable pattern
error (shown earlier in the discussion of viewr
) when
one of the values was used (in this case, displaying the output of the v1..v4
tuple).
The code above can hardly be called elegant. One solution is to expand the "language" of the parser, so that
it can parse things that we are interested in - in this case a Row Float32
- so that I
can leave runGet
(or runGetState
) to deal with the boring parts.
So, let's look at the type of the getFloat32be
parser:
In [69]:
:type getFloat32be
Hopefully this signature will be somewhat surprising (to the non-Haskellers reading this, anyway) since:

- what is this Get structure?
- it doesn't seem to have any way to input the bytestring, nor does it have a way of returning the unparsed data!
Well, this is how Haskell deals with context-sensitive computations. In this case, "context" means "the remaining bytestring to parse", but it can also be a store of messages that log a computation, or some piece of state information that can be both read or changed (e.g. the arrangement of a plot) during a computation, or whether a computation has failed, or ... The most obvious context, which I've been using in these notebooks without mentioning it, is I/O (e.g. printing messages to the screen or reading the contents of the file). Haskell bases all these different sorts of computation on the same underlying mathematical abstraction. It can be thought of as a "mini language" inside Haskell, often recognizable by code written in the style
do
  a <- someFunc
  b <- anotherFunc a
  return (a+b)
which we have seen in the previous notebooks (and will see again
below) when creating graphs.
Note that return
in Haskell can be confusing for people with experience in languages such as C, Fortran, or Python. For now, just be aware that return
is one of those places that Haskell is different to other languages.
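As a small - and contrived - illustration, using the Maybe type from earlier: return does not exit the computation, it just produces a value, so below the first return is evaluated and then simply discarded.
notAnExit :: Maybe Int
notAnExit = do
  return 1    -- not an early exit; this value is thrown away
  return 2
notAnExit   -- Just 2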
What the above means is that I can write a parser for a single row of data - which consists of three floats with no padding - with the following code (not that I'd expect you to be able to work this out from my description!):
In [70]:
getRow1 :: Get (Row Float)
getRow1 = do
elo <- getFloat32be
ehi <- getFloat32be
spec <- getFloat32be
return (Row elo ehi spec)
Checking this out with runGet
gives:
In [71]:
runGet getRow1 bdata
and
In [72]:
runGet getRow1 (B8.drop 12 bdata)
Out of interest, here's what happens when the parsing fails (which in this case means "runs out of data", which I simulate by sending in less-than twelve bytes):
In [73]:
runGet getRow1 (B8.take 4 bdata)
The fact that I labelled this routine getRow1
might suggest to you that
I'm going to re-write it, and you would not be wrong. Using yet-more
mathematical abstractions - in this case
applicative functors -
the routine can be written more compactly as:
In [74]:
import Control.Applicative ((<$>), (<*>))
getRow :: Get (Row Float)
getRow = Row <$> getFloat32be <*> getFloat32be <*> getFloat32be
This version resembles the "point-free style" I described earlier, in that it avoids
naming "temporary things". I mention it here in case you see these symbols (<$>
and <*>
) around and wonder what they are (plus, this is how I originally wrote the routine before deciding it was a tad cryptic
to lead with). As an aside, GHC 7.10 came out whilst I was putting together this notebook. One of the changes in this release is
known as the "AMP", which - for the purposes of
this notebook - means that the code I've written may need small tweaks to compile without
warning messages when using GHC version 7.10 or later.
I can now build on getRow
(or, equivalently, getRow1
), to write a parser for
multiple rows, by repeatedly calling getRow
until the parser runs out of
data. Fortunately there's a function -
isEmpty
-
for this check. The resulting parser,
which I based on code from the
binary module documentation,
is
In [75]:
getRows :: Get [Row Float]
getRows = do
empty <- isEmpty
if empty
then return []
else do
row <- getRow
rows <- getRows
return (row:rows)
When run, it checks to see if there's any more data. If not, the result is the empty list, otherwise it parses the current row, then makes a recursive call (parsing the rest of the rows), before combining the two.
This time, when running the parser, I explicitly deal with the error case by
converting it to an
error
, which will (hopefully) provide more useful information than an Irrefutable pattern
error
(although I don't plan on showing this feature off here ;-). In the success case
I strip off the Right
wrapper.
In [76]:
arows = case runGet getRows bdata of
Left msg -> error msg
Right v -> v
In [77]:
head arows
arows !! 1
So, yet-more success. Let's check that there are the correct number of rows (NAXIS2
was 1070):
In [78]:
length arows
Oops. This is because the data blocks are chunked up into units of 2880 bytes, just like the header blocks, so there are excess rows, as 1070 times 12 does not map to a chunk boundary:
In [79]:
(1070 * 12) `divMod` chunk
(1200 * 12) `divMod` chunk
In this particular case the extra values all decode to 0:
In [80]:
arows !! 1069
arows !! 1070
arows !! 1199
If I were doing this properly, I'd send in the correct number of bytes to getRows (in this case NAXIS1
times NAXIS2
bytes) but for now I can just drop the invalid rows:
In [81]:
rows = take 1070 arows
head rows
rows !! 1
rows !! 2
rows !! 3
rows !! 4
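For completeness, here is a sketch of what the "proper" approach mentioned above might look like, re-using getKey from earlier. The names (toInt, nbytes, rowsExact) are just ones I've made up for this sketch, and note that the header values can still carry a trailing "/ comment", so only the leading token is handed to read.
-- keep only the leading (numeric) token of a header value
toInt :: String -> Int
toInt = read . takeWhile (/= ' ')
nbytes = toInt (getKey "NAXIS1") * toInt (getKey "NAXIS2")
rowsExact = case runGet getRows (B8.take nbytes bdata) of
              Left msg -> error msg
              Right v -> v
length rowsExact   -- should be 1070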
It looks like the bins are consecutive; that is, the energHi
field of row n
is the
same as the energLo
field of row n+1
. To see whether this holds for all rows, I
extract the contents of the "low" and "high" fields with calls to
map
,
remove the first element of the "low" array (with
tail
,
which is like Python's [1:]
array slice),
the last element of the "high" array (with
init
,
which is Python's [:-1]
slice),
and then check that the resulting arrays are equal. This only
works because of the
Eq
instance of Row
that I earlier asked the compiler to derive for me (that is,
==
can deal with type [a]
- a list of a
's - if it knows how to compare
a
's).
In [82]:
elos = map energLo rows
ehis = map energHi rows
tail elos == init ehis
Finally, it's time to make a plot of the data. I load in the necessary code to ensure that the plots can get displayed in-line (this just re-uses the code I created in previous notebooks):
In [83]:
import IHaskell.Display
import Graphics.Rendering.Chart.Backend.Diagrams
-- Apparently |> is also defined in the Easy module, so, as I have already imported
-- it from Data.Sequence, I explicitly avoid importing it here.
--
import Graphics.Rendering.Chart.Easy hiding ((|>))
instance IHaskellDisplay (Renderable a) where
display renderable = renderableToSVG renderable 450 300 >>= display . fst
Now I create a function which takes a list of rows and creates a plot. Since the data
is binned - in that the SPECRESP
column is defined over an energy range - then
I want to display it as a histogram. Now, there is support for drawing histograms in
Chart (e.g.
PlotBars
)
but I don't find the interface intuitive, so I am going to
create a plot manually$^\dagger$. I can do this by drawing a line that connects
(elo_0,specresp_0)
, (ehi_0,specresp_0)
,
(elo_1,specresp_1)
, (ehi_1,specresp_1)
, ...
(elo_1069,specresp_1069)
, (ehi_1069,specresp_1069)
.
Ideally the list would start at (elo_0,0)
and end at (ehi_1069,0)
, but I'll leave
that for now. The code also takes advantage of the Functor
instance of the
Row
type that I wrote earlier, to convert the storage type from Float
to Double
(the reason for this is given in the comments).
Note that the plot is defined using "do" notation, although this time there are no
obvious calls to "set" variables (i.e. nothing of the form a <- function
). This is
because the Chart API uses the
lens package
to greatly-reduce the amount of code needed to set the various fields it uses
(such as the various "titles" which I change below). If you're interested in
trying out Chart, I would
use the
Chart examples as a basis,
before looking too deeply into how to use lens.
$^\dagger$ pointers to existing code that already does this are most welcome.
In [84]:
drawARF ins = toRenderable (do
layout_title .= "ARF"
layout_x_axis . laxis_title .= "E (keV)"
layout_y_axis . laxis_title .= "cm^2"
plot (line "" [cs])
)
where
-- At present there's no PlotValue instance for Float types, which means
-- that I can't just send a list of Float values to line (I have a
-- PR on GitHub to address this issue at
-- https://github.com/timbod7/haskell-chart/pull/77,
-- which has now been accepted, but rather than re-build with that
-- version, the easiest solution is to convert the Float values
-- to Double). I could do this manually, but as I have a Functor instance
-- on the Row type, I can convert Row Float to Row Double just with a
-- call to "fmap realToFrac".
--
rs = map (fmap realToFrac) ins
(elos, ehis, arfs) = unzip3 (map (\(Row elo ehi arf) -> (elo,ehi,arf)) rs)
xs = concat (zipWith (\a b -> [a,b]) elos ehis)
ys = concatMap (\a -> [a,a]) arfs
cs = zip xs ys
Note that there is a Functor instance of lists, and it is the same as map
,
which means that I could have said
rs = fmap (fmap realToFrac) ins
but while the compiler can work out which fmap
it needs to use, for this
example I decided to be explicit!
The moment of truth: what does the ARF look like?
In [85]:
drawARF rows
When discussing the k correction, the spectral models were in units of photon/cm$^2$/s/keV. This measures the flux of photons from a source (photon/s) per energy (the /keV term) and per unit area (the /cm$^2$ term). The ARF is given in units of area (cm$^2$), so if you multiply it by the spectral model you get a spectrum in units of photon/s/keV, which - when integrated up over an energy range - gives a flux in units of photon/s. A physical interpretation is that the ARF measures the sensitivity of the telescope as a function of energy. The fact that the ARF is not a constant, and in fact shows significant structure such as a large edge at 2 keV, is one reason why understanding X-ray data (that is, comparing models to data), is not simple.
The Chandra mirrors are large - "human" size, with a mass of about 1500 kg - but, because this is an X-ray telescope, they are not arranged as they would be in an optical telescope (such as the Ritchey–Chrétien design used in the Hubble Space Telescope) which reflects light, but as four concentric cylinders (known as a Wolter Type-I design) which uses grazing-incidence reflection to focus the light:
Image credits: NASA/CXC/D.Berry, available at http://chandra.harvard.edu/resources/illustrations/teleSchem.html
The cylinders have diameters ranging from 65 cm to 1.23 m, but an area of 400 cm$^2$ is equivalent (when thinking about an area of sky measuring the flux) to a circle of radius 11 cm:
In [86]:
sqrt (400 / pi)
This can lead to jealousy when talking to our optical colleagues, who can talk about using telescopes with radii of 4 m and above, and there are plans for 15 m radii in the near future! In defense of the people who build X-ray mirrors, optical telescopes do not use the full mirror area (M$_1$), since the secondary mirror (M$_2$) obstructs some light, and there's generally a big hole in the center of the mirror, in part, because the secondary mirror occludes this area, but mainly because this is where the light is reflected down to the tertiary mirror and/or instruments ($F$ in the image below)!
Image credits: By HHahn (Own work) [CC BY-SA 3.0 (http://creativecommons.org/licenses/by-sa/3.0) or GFDL (http://www.gnu.org/copyleft/fdl.html)], via Wikimedia Commons
So, I guess the summary is: don't judge an Astronomer by the size of her mirror!
As I was writing this notebook, I got to thinking about other ways to display the data.
One obvious choice was a HTML table, which gave me an excuse to play a bit more with
the IHaskell
display code. Fortunately, I have already installed the necessary Haskell packages - in this
case
blaze-html
- so I can go straight
to work. The routines from the
Html5
module are going to do all the heavy lifting of creating the HTML output; all I
need to do is put them together in the right way.
In [87]:
import Text.Blaze.Html5 hiding (head, map)
import Data.Monoid ((<>), mconcat)
As well as the Html code, I have also loaded the <>
operator from
the
Data.Monoid
module. The Monoid typeclass$^\dagger$ represents things that can be combined together,
that is - have an associative operator -
using <>
(well, it actually means a bit more than that, but all I need is the "combine"
operator, which is also known as
mappend).
The
mconcat
function lets you combine a list of values using <>
; that is
mconcat [a_0, a_1, a_2, ... a_n]
is the same as
a_0 <> a_1 <> a_2 <> ... <> a_n
.
$^\dagger$ 10 points to the person in the back of the class - yes, you over there in the floral dress - who just mumbled "oh what, Yet Another Mathematical Abstraction In Haskell!".
As an example, lists can be concatenated together with
++
,
but this form is specific to lists. The generic version - since a list
has a Monoid instance - is <>
; that is, the following two are the same:
In [88]:
[1,3] ++ [2,4]
[1,3] <> [2,4]
To start off, I want to try converting a single Row
into a row of a table
(i.e. create a <tr>
element containing the column data in <td>
elements).
To do this, each element of the row is
converted to Html using toHtml
and then converted to a <td>
element, using the appropriately-named
td
function.
The three <td>
elements are combined together using <>
, which
appends them to form <td>...</td><td>...</td><td>...</td>
. These
are then placed inside a <tr>
element.
In [89]:
rowToHtml (Row elo ehi sp) =
let col val = td (toHtml val)
in tr (col elo <> col ehi <> col sp)
Now, the toHtml
function has an interesting signature:
In [90]:
:type toHtml
which says that it can be given any type, as long as it is an instance
of the
ToMarkup
typeclass.
This constraint - indicated by the text to the left of the =>
in the signature,
"infects" the rowToHtml
signature, since we have:
In [91]:
:type rowToHtml
This makes sense, since it says that rowToHtml
can only be used on a Row
whose
type can be converted to Html. Fortunately the Float
type is an instance of the
ToMarkup
class. You can find this out from the documentation, either of the
ToMarkup
class or Float
- e.g. (the :opt
line is just to tell the notebook
to include the information inline, so it's visible in the HTML version of the page,
but it appears that it may not always display particularly well):
In [92]:
:opt no-pager
In [93]:
:info Float
However, the easiest way is to just try it and see if it compiles, or you get an error back
(for example, earlier when I tried len / 2880
and got an error about a missing
Fractional
constraint)!
Applying it to the last row produces:
In [94]:
rowToHtml (last rows)
Since the Html
type is an instance of the IHaskellDisplay
type class, the
notebook has automatically interpreted this for us, which is why we can't
see the <td>
and <tr>
elements.
In fact, I can make the Row
type an instance of the ToMarkup
typeclass using this
function:
In [95]:
instance ToMarkup a => ToMarkup (Row a) where
toMarkup = rowToHtml
which means that I can now say:
In [96]:
toMarkup (last rows)
Due to the way the code is setup, if you can call toMarkup
on a type you can
also call toHtml
:
In [97]:
toHtml (last rows)
What happens if I try to convert several rows?
In [98]:
toMarkup (take 3 rows)
In other words, there is no instance of ToMarkup a => ToMarkup [a]
,
which makes sense because just because I know how to convert a single
element to Html, there's no single way to combine them. This suggests
that, semantically, I want a data type that represents a "table", so that
it makes sense to say "oh, when converting this to Html I want to
include a table header as well as the data".
This leads me to defining an ARF
type, based on a list of Row
values
(I am also going to restrict the row type to be Float
):
In [99]:
newtype ARF = ARF [Row Float]
A value with a type of ARF
is created by using the ARF
label (constructor),
but as shown below, the notebook doesn't know what to do with it (because I
did not tell the compiler to create a Show
instance for the type, as I
did with the Row
type):
In [100]:
ARF rows
A Show
instance could be added easily - for instance:
instance Show ARF where
show (ARF rows) = "ARF " ++ show rows
but we're here to create a HTML table! Since there's a lot of rows - over a thousand of them - I want
to be able to show only a subset of them in this table. One simple interface is to use a Maybe Int
as an indicator:
- Nothing means to display all rows
- Just n means only show the first and last n rows

although other options are possible. This suggests the following function, which
splits an input list up into the start and end subsets, if a subset is asked
for (i.e. the count is Just n
) and the list is long enough. In this case I
use the Either
type to return a value, not because the Left
case should
be considered a failure, but because I want to return either the full list
(Left
) or the start and end subsets (Right
). The reason for splitting the
two cases will hopefully become clear soon$^\dagger$.
$^\dagger$ If you can't wait, it's because I want to add a row of "..." characters to indicate when rows have been excluded.
In [101]:
subsetRows ::
Maybe Int -- ^ how many rows to select from start and end, if any
-> [a] -- ^ the input list
-> Either [a] ([a],[a])
subsetRows mcount xs =
let n = length xs
in case mcount of
Just nr | n > 2 * nr -> let start = take nr xs
end = drop (n-nr) xs
in Right (start, end)
_ -> Left xs
There are actually two cases above that result in Left xs
being returned:
- mcount is Just nr, but n <= 2 * nr
- mcount is Nothing
I use the "catch-all" _
case to handle both of these (this is in contrast
to the earlier case when I called viewr
but explicitly named both patterns
of the case
statement).
Checking this gives the expected (or, hopefully you expect them) results:
In [102]:
subsetRows Nothing [1..10]
In [103]:
subsetRows (Just 3) [1..10]
In subsetRows I did not assume anything about the contents of the list (in other words, the list type a was not constrained). This is often "a good thing™", since it means that the type signature alone tells us a great deal about what the function can, and can not, do.
That is, with a type like [a] -> [a]
, we know that the function can either return the empty list or values drawn
from the input list (maybe repeated), but it can not return a value not in the input list (since, without knowing the type, it can not "create" a value). This can be compared to a function like [Int] -> [Int]
which is free to replace every even value by 2 and odd value by the length of the list.
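Here are two (made-up) functions with the fully-general type, to show what that restriction means in practice: both can only select or rearrange elements of the input, since neither has any way of creating a value of the unknown type a.
everyOther :: [a] -> [a]
everyOther (x:_:rest) = x : everyOther rest
everyOther xs         = xs

dropAll :: [a] -> [a]
dropAll _ = []

everyOther [1..10]   -- [1,3,5,7,9]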
Eventually I do need to write a function with a specific type! In this case, I want something that
will create the <tr>
elements for the HTML table, marking any excluded rows by using a
row of ellipses - i.e. a row like
In [104]:
toHtml (Row "..." "..." "...")
but using the UTF-8 ellipsis character rather than three dots.
Using subsetRows
I can convert the
output rows into HTML using toHtml
, and insert a row of ellipsis if the return
value was Right
. I could do this just for Row Float
, but let's keep trying to
be general and allow any row with a ToMarkup
-compatible type:
In [105]:
htmlSubset :: ToMarkup a => Maybe Int -> [Row a] -> Html
htmlSubset mcount xs =
let out = case subsetRows mcount xs of
Left ys -> map toHtml ys
Right (ls,rs) -> let hellip = "…"
spacer = toHtml (Row hellip hellip hellip)
in map toHtml ls <> [spacer] <> map toHtml rs
in mconcat out
I used <>
to combine the lists but I could have also used ++
here.
Since each element of out is Html
, and Html
is a Monoid, I can use
mconcat
to combine them together, so that the return is Html
rather than
[Html]
.
There's a subtle point in the above for the newcomer to Haskell: why did I write
map toHtml ls <> [spacer] <> map toHtml rs
rather than
map toHtml (ls <> [spacer] <> rs)
that is, combine ls
, spacer
, and rs
into a list first? The reason is
because the types don't match up: ls
and rs
both have type [Row a]
,
but spacer
is the output of a toHtml
call, and so has type Html
. The
compiler knows this, and won't let you combine lists of different types.
So, I convert the ls
and rs
values to Html
, and then I can combine
the three terms.
As a check it works (although it isn't displayed particularly nicely, as there is no
<table>
around the <tr>
's created by the routine):
In [106]:
htmlSubset (Just 2) rows
The "header" row can be created manually (I'd probably not pull this out
into a separate routine, other than in a notebook like this). It's somewhat
ugly since there's a lot of type conversion going on (i.e. all the
calls to toHtml
). There are ways to avoid this - in particular the
OverloadedStrings
GHC extension,
but there are times when it's worth being explicit (if only
to point out where some of the "warts" in a language are):
In [107]:
arfHeader :: Html
arfHeader =
let c1 = toHtml "E" <> sub (toHtml "low") <> toHtml " (keV)"
c2 = toHtml "E" <> sub (toHtml "high") <> toHtml " (keV)"
c3 = toHtml "ARF (cm" <> sup (toHtml "2") <> toHtml ")"
in tr (th c1 <> th c2 <> th c3)
arfHeader
These can then be put together to create a table:
In [108]:
-- | Convert a list of rows into a HTML table.
--
--
arfToHtml ::
Maybe Int -- ^ if `Nothing`, display all rows, otherwise just this number at the start and end
-> ARF -- ^ data to display
-> Html -- ^ HTML representation
arfToHtml mcount (ARF rows) =
table (thead arfHeader <> tbody (htmlSubset mcount rows))
Testing on a subset of the rows gives (the output may not appear quite correctly, in that horizontal and vertical lines may be missing, depending on how you are viewing this notebook):
In [109]:
arfToHtml Nothing (ARF (take 5 rows))
In [110]:
arfToHtml (Just 1) (ARF (take 5 rows))
With this routine, I can finally create an instance of IHaskellDisplay
for
the ARF
type, where I choose to use a display of 3 rows:
In [111]:
instance IHaskellDisplay ARF where
display = display . arfToHtml (Just 3)
and use this to display the ARF in tabular form:
In [112]:
ARF rows
This could be extended to include metadata about the ARF, but that would require extracting
the relevant information from the header (i.e. parsing the value strings to separate out the data from
the comment string), and enhancing the ARF
data type to store this information. Perhaps next time...
There you go; I hope you enjoyed it. If you have any questions, then please use the GitHub issues page or contact me on Twitter at https://twitter.com/doug_burke.